Abstract

The outcome of the British General Election to be held in just over one week's time (May 7th, 2015) is widely regarded as the most difficult in living memory to predict. Current polls suggest that the two main parties (Conservative and Labour) are neck and neck but that there will be a landslide to the Scottish National Party, with that party taking most of the constituencies in Scotland (some 50 out of 59 on the most recent forecast, for Sunday April 26th). The Liberal Democrats are forecast to lose more than half their seats (56 to 24), and the fringe parties, of which the UK Independence Party is the biggest, are simply unknown quantities. Much of this volatility relates to long-standing and deeply rooted cultural and nationalist attitudes that follow geographical fault lines which have been present for 500 years or more but which reveal themselves only occasionally, at times like this. In this paper our purpose is to raise the notion that these fault lines are critical to thinking about regionalism, nationalism and the hierarchy of cities in Great Britain (excluding Northern Ireland). To reveal them we use a percolation method [1] that treats Britain as a giant cluster of related places, each defined from the intersections of the road network at a very fine spatial scale (down to 50 metre resolution). We break this giant cluster into a detailed hierarchy of sub-clusters by successively reducing a distance threshold, starting at 5 kilometres, which first breaks off some of the Scottish Islands and then reveals the very distinct nations and regions that make up Britain, all the way down to the definition of the largest cities, which appear when the threshold reaches 300 metres. We use these percolation clusters to apportion the 2010 voting pattern to a new hierarchy of constituencies based on these clusters, and this gives us a picture of how Britain might vote on purely geographical lines. We then examine this voting pattern, which gives us some sense of how important the new configuration of political parties might be to the election next week.
1 Introduction


Long-standing cultural differences between the countries that comprise the United Kingdom are part of the folklore of British politics, but in the last twenty-five years they have reasserted themselves with a vengeance. The devolution of central powers from the government in Westminster to Scotland and Wales was first initiated a generation ago, while Northern Ireland has had periods since the 1970s when its traditional parliament, now assembly, was entirely suspended. However, these divisions did not really assert themselves in national voting until the last decade, and it now looks as though Britain, in its forthcoming general election on 7th May 2015, will finally divide itself along very deep and historically significant lines.

In recent times, since the 1960s, the traditional two-party system that had dominated politics since the early 20th century began slowly to fracture, with the Liberals (now Liberal Democrats) reasserting themselves in south west England while also making inroads into somewhat conservative but relatively high-income city suburbs and country towns. The Scottish National Party (SNP) began to gain more seats after devolution, and recent local and European elections have led to a massive increase in their support that is widely seen as a sea change in the quest for Scottish independence. The SNP are forecast to more or less wipe out the traditional Labour Party in Scotland in the forthcoming general election. The Welsh nationalists, Plaid Cymru, have a much smaller base in Wales, although it is entirely possible that they will gain seats in May, while in Northern Ireland the traditional focus on conservatism and nationalism has led to a split with the Conservative Party in England, to which it is traditionally allied. Northern Ireland politics have become much more inward looking.

The two parties that dominated national politics until quite recently, the Conservatives (the Tories) and the Labour Party, have also changed. The Blair and Brown governments, from 1997 to 2010, introduced the philosophy that was called New Labour, with the party much more geared to contemporary business ethics and deregulation. There has been a substantial reaction against this, with the party reverting to a somewhat more traditional stance but, like the present coalition of Conservatives and Liberal Democrats, espousing a pro-austerity stance in the wake of the Great Recession. The Liberals, of course, joined the Conservatives in 2010 in a coalition, the first long-lasting one for well over 70 years, and this has softened and blurred traditional thinking amongst the Conservatives and perhaps hardened and confused the philosophy of the Liberal Democrats. Add to this the emergence of the anti-immigration and anti-European Union party, UKIP (the United Kingdom Independence Party), and the picture at first sight appears more confused than at any time since the Labour Party began to erode support for the Liberals in the late 19th century.

Or is it? In our work on defining cities and regions within Britain (the UK less Northern Ireland), we define similar places by their connectivity to one another. Essentially we begin by treating Britain and all its places as a giant connected cluster, defined from the detailed road network that links places together; at their most atomic level these places are the nodes where street segments intersect. The graph that defines this giant cluster has of the order of $|N| \approx 3.3 \times 10^6$ nodes and $|E| \approx 4 \times 10^6$ segments, and when simplified as a symmetric graph it gives an average node degree of $\langle k \rangle = 2.34$. To decompose this network into clusters with different degrees of connectivity, we begin by specifying a threshold based on the maximum segment length in the network and then gradually reduce it, producing a hierarchy of clusters which is a unique decomposition of the British geographical space. Note that we use street intersections as the basis for our definition of any spatial unit, so a parliamentary constituency, an area for which a politician is elected, is regarded as all the nodes and segments that fall uniquely into the physical area defining that space.
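For reference, the average degree of an undirected simple graph is $\langle k \rangle = 2|E|/|N|$. The Python sketch below shows that computation under our own assumptions: the network is given as a list of (u, v) node-id pairs (a hypothetical format, not the authors' data structure), and "simplified as symmetric" is read as dropping self-loops and merging reversed or repeated segments into a single undirected edge.

```python
def average_degree(segments):
    """Average degree <k> = 2|E| / |N| of an undirected simple graph.

    segments: iterable of (u, v) node-id pairs (hypothetical input format).
    Self-loops are dropped and reversed or repeated pairs are merged, so
    every undirected edge is counted once.
    """
    nodes, edges = set(), set()
    for u, v in segments:
        nodes.update((u, v))
        if u != v:
            # frozenset makes (u, v) and (v, u) the same undirected edge
            edges.add(frozenset((u, v)))
    return 2 * len(edges) / len(nodes) if nodes else 0.0
```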

The percolation method developed in [1] starts with the entire cluster, setting the starting threshold at d = 5000m, and the hierarchy of clusters emerges as we successively reduce this value. As we might expect from a casual knowledge of Britain, the more remote periphery disconnects first, but we are not able to anticipate the actual partitioning. When the threshold reaches 1.4 km, Scotland breaks off completely from England and Wales. The break is very geographically distinct, with the central lowlands dividing the country entirely from England, excluding a few of the English border counties in the south. When the threshold falls to 900m, the Industrial North and West and Wales separate off from the South East and then, when it falls another 100 metres, the South West and South Wales become distinct from this division. The big cities then fall out of this when the threshold falls to 300 metres. The hierarchy is so clear that it is hard not to conclude, without knowing anything else about these regions, that they are culturally and economically quite distinct. In the light of the debate about Scottish independence and recent European election results, the geographical correlations with the predominant voting patterns are surprising and very clear. The influence of geographical boundaries on voting dynamics has already been studied in a few papers in the literature [2-4], but the notion of percolation and the way that it divides the geographical space vastly improves our understanding of the matter and allows us to define those geographical units unambiguously. Although there are many explorations of voting behaviour using physical concepts, particularly in opinion dynamics and voting structure [5], and many that have applied the concept of percolation to consensus decision making and even studied the spread of opinions through networks [6-10], none, as far as we are aware, has used a geographical percolation to explain voting patterns.

To get an immediate idea of this kind of clustering, we refer you to our percolation movie, in which we successively partition the space by systematically increasing the distance threshold. Figure 1 shows the partitions produced by the percolation at the thresholds that produce the most relevant divisions (300m, 800m, 900m, 1400m and 5000m).

If we were to suggest that the Scottish cluster coincided with predominantly SNP voters, the south west with Liberal Democrats, the industrial north with Labour, and the south east and shire counties with Conservatives, then one would not be far off the mark in terms of what people have speculated over the last six months about the forthcoming May election. UKIP do not show up in this physical decomposition, and Wales blurs into the northern and western clusters, for Plaid Cymru is largely a rural party, remote even within Wales. The local effects in urban areas, where inner cities are more likely to vote Labour and the suburbs Conservative, are picked up when we go down to the much finer city thresholds. This encourages us to think that the degree of isolation detected by percolation, together with orientation towards and nearness to other areas in the British population space, are remarkably strong determinants of how people will vote.
Figure 1: Network percolations at the main thresholds, respectively 300m, 800m, 900m, 1400m and 5000m. To view the animated sequence of percolation distances please refer to: http://www.mechanicity.info/percolation-clustering-of-the-uk-road-network/.


2 Clustering of the percolation 

2.1 Preliminary definitions 

Let $I$ be the set of intersections of the road network of Britain and let $C = \{C_1, \dots, C_n\}$ be the set of parliamentary constituencies, with $n$ being the number of constituencies. Each $C_i \in C$ is the set of intersections that belong to the $i$-th constituency, such that $C_i \cap C_j = \emptyset$ for all $i \neq j$ and $\bigcup_{i=1}^{n} C_i = I$.

For each constituency $C_i$ we have a vector of voting behaviour $v_i = (v_{i,1}, \dots, v_{i,l})$ composed of $l$ elements, $l$ being the number of political parties. We will refer to the $k$-th element of the vector $v_i$ as $v_{i,k}$, which is equal to the number of votes that political party $k$ received in constituency $i$.

2.2 Obtaining the percolation clusters 

The technique used to generate the percolation clusters is explained in detail in [1]. In the following paragraphs we give a summarised version of the technique used to perform a network percolation, of how it serves to generate a tree of the percolations, and of how we can use that tree to assign a unique cluster to each intersection of the road network of the UK. Given a graph of the road network, where nodes represent intersections and the weight of each edge is the length of the street that connects them, and a metric threshold (e.g. 5000m), we produce a network percolation as follows:

1. Select the transition (edge) of the graph with the smallest weight (distance), generate a new cluster and insert both of its nodes into the cluster.

2. Keep a first-in first-out queue of nodes to expand, from which nodes are extracted to continue the process. Add both nodes of the transition selected in step 1 to this queue. Nodes are only added to this queue if they are not already included in the cluster.

3. Extract a node from the queue and, for every transition departing from that node that is not yet included in the cluster and whose length is smaller than the threshold, include the transition in the cluster and add its end node to the queue.

4. Repeat step 3 until no further node can be expanded (the queue is empty). If there are transitions left in the graph that do not belong to any cluster, generate a new cluster by choosing the smallest available transition and repeat from step 1.

This procedure will cover the complete graph with clusters, most of them irrelevant 
clusters of only a few nodes. To avoid this noisy behaviour we set a minimum size for a 
cluster of 75 nodes in order to include it in the set of percolation clusters. 
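The procedure above amounts to a breadth-first search seeded from the shortest unused segment that follows only edges shorter than the threshold, so each cluster is a connected component of the sub-network of short edges. The Python function below is a minimal sketch under that reading; the edge-list input format and the function name are our own assumptions, not the code used in [1]. As a simplification it only seeds clusters from edges below the threshold: a seed edge above the threshold would yield a two-node cluster that the 75-node minimum discards anyway.

```python
from collections import deque

def percolation_clusters(edges, threshold, min_size=75):
    """Connected components of the sub-network of edges shorter than threshold.

    edges: list of (u, v, length) tuples, an assumed input format where u and v
    are intersection ids and length is the street length in metres.
    Returns a list of node sets, keeping only clusters with >= min_size nodes.
    """
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    assigned = set()          # nodes already placed in some cluster
    clusters = []
    # Steps 1 and 4: seed each new cluster from the shortest unused edge.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if w >= threshold:
            break             # no remaining edge is short enough to join anything
        if u in assigned or v in assigned:
            continue          # edge already absorbed by an earlier cluster
        cluster = {u, v}
        assigned.update((u, v))
        queue = deque([u, v])  # Step 2: FIFO queue of nodes to expand
        while queue:           # Steps 3-4: expand until the queue is empty
            node = queue.popleft()
            for nbr, length in adjacency[node]:
                if length < threshold and nbr not in assigned:
                    assigned.add(nbr)
                    cluster.add(nbr)
                    queue.append(nbr)
        clusters.append(cluster)
    return [c for c in clusters if len(c) >= min_size]
```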

If we repeat this algorithm for a large set of distance thresholds (from 5000m down to 50m in steps of 50m), the largest distance produces one single cluster for the whole of the UK that includes every intersection. The next distance produces a set of smaller clusters completely contained in the previous one, leaving behind a few intersections. In this manner we can generate a tree of the percolation, which is portrayed in Figure 2.

The set $P$ of percolation clusters is the extended set that includes every cluster in this tree, with $m$ being the number of percolation clusters. Each $P_j \in P$ is the set of intersections that belong to the $j$-th percolation cluster, such that $P = \{P_1, \dots, P_m\}$ and $\bigcup_{j=1}^{m} P_j = I$. The clusters in $P$ are not disjoint: the same intersection belongs to several percolation clusters simultaneously as long as they have a parent-child relationship. That is, given two percolation clusters $P_j$ and $P_{j'}$, $P_j \cap P_{j'} \neq \emptyset$ if there exists a path in the percolation tree from $P_j$ to $P_{j'}$ (or vice versa), and otherwise $P_j \cap P_{j'} = \emptyset$.
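Because every edge shorter than a smaller threshold is also shorter than a larger one, each cluster at a finer threshold lies inside exactly one cluster at the next coarser threshold, and a cluster that passes the 75-node minimum always has a parent that passes it too. The sketch below builds the parent links of the percolation tree from per-threshold cluster lists; the input layout and the function name are hypothetical.

```python
def percolation_tree(clusters_by_threshold):
    """Link clusters at successive thresholds into a tree.

    clusters_by_threshold: dict mapping threshold (metres) to a list of
    non-empty node sets (assumed format).
    Returns a dict mapping (threshold, index) -> parent (threshold, index),
    with None for the clusters at the largest threshold (the roots).
    """
    thresholds = sorted(clusters_by_threshold, reverse=True)
    parents = {}
    for level, t in enumerate(thresholds):
        if level == 0:
            for j in range(len(clusters_by_threshold[t])):
                parents[(t, j)] = None
            continue
        coarser = thresholds[level - 1]
        # Map each node to the index of the coarser cluster that contains it.
        owner = {}
        for j, cluster in enumerate(clusters_by_threshold[coarser]):
            for node in cluster:
                owner[node] = j
        for j, cluster in enumerate(clusters_by_threshold[t]):
            # A finer cluster is nested in exactly one coarser cluster,
            # so any one of its nodes identifies the parent.
            node = next(iter(cluster))
            parents[(t, j)] = (coarser, owner[node])
    return parents
```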



[Figure 2 plot: percolation tree, from the root cluster (Great Britain, 5000m) down to 300m.]


Figure 2: Left, the complete tree of the percolation using all the calculated thresholds and every cluster larger than 75 nodes. Right, a simplified version of the percolation tree, generated using only some selected thresholds and the 10 largest clusters per distance, to make the approach easier to understand.
We define the operation of intersecting a constituency with the set of percolation clusters as the operation that returns a vector $p_i$ of size $m$, where each component $j$ of the vector is the number of intersections that belong to $C_i \cap P_j$. That is, $C_i \cap P = p_i$, with $p_{i,j} = |\{x \mid x \in C_i \cap P_j\}|$.

This operation generates a vector for each constituency that describes its composition in terms of the percolation clusters, and this vector in turn serves to cluster the constituencies into groups of similar behaviour according to the percolation.
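The composition vectors $p_i$ reduce to set intersections. In this hypothetical sketch each constituency $C_i$ and each percolation cluster $P_j$ is a Python set of intersection ids; because $P$ contains clusters from every level of the tree, one intersection contributes to several components of $p_i$.

```python
def composition_vectors(constituencies, clusters):
    """Return p_i for every constituency, with p_{i,j} = |C_i ∩ P_j|.

    constituencies: list of sets of intersection ids (C_1, ..., C_n).
    clusters:       list of sets of intersection ids (P_1, ..., P_m),
                    drawn from every level of the percolation tree.
    """
    return [[len(c & p) for p in clusters] for c in constituencies]
```

At the scale of 3.3 million intersections, an inverted index from each intersection to its clusters would be faster than repeated set intersections, but it yields the same vectors.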

2.3 Clustering the percolation clusters following the parliamentary constituencies subdivision

We use the partitioning around medoids algorithm (PAM [11]) to cluster the vectors $p_i$ using the chi-squared distance [12] between them. In more detail, the distance between the constituencies $C_i$ and $C_k$ is

$$d(p_i, p_k) = \sum_{j=1}^{m} \frac{(p_{i,j} - p_{k,j})^2}{p_{i,j} + p_{k,j}}.$$

This algorithm clusters the constituencies into different sets according to their composition of percolation clusters. We call the set that holds these disjoint clusters $A$, where each component $A_g \in A$ is composed of several constituencies ($A_g = \{C_i, C_k, \dots\}$) such that $\bigcup_{g} A_g = C$ and $A_g \cap A_j = \emptyset$ for all $g \neq j$. Figure 3 shows the result of applying the clustering algorithm to the space of the UK for different numbers of clusters. Throughout the rest of the paper we use 60 clusters, since this is about 10% of the actual number of constituencies, which represents a reasonable compression limit.
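A minimal sketch of this step, assuming the standard chi-squared form $d(p_i, p_k) = \sum_j (p_{i,j} - p_{k,j})^2 / (p_{i,j} + p_{k,j})$ (some definitions add a factor of 1/2, which rescales every distance and leaves the clustering unchanged). The `pam` function is a simplified PAM-style k-medoids of our own: it starts from randomly chosen medoids instead of PAM's BUILD phase and then greedily accepts any medoid swap that lowers the total distance. It is meant to show the mechanics on small inputs; a library implementation of PAM would be the practical choice for the full set of constituencies and 60 clusters.

```python
import random

def chi2_distance(p, q):
    """Chi-squared distance: sum over j of (p_j - q_j)^2 / (p_j + q_j).

    Components where p_j + q_j == 0 contribute nothing and are skipped.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def pam(vectors, k, max_iter=100, seed=0):
    """Simplified PAM-style k-medoids on the chi-squared distance.

    Returns (medoids, labels): medoid indices and, for every vector,
    the index (0..k-1) of its nearest medoid.
    """
    n = len(vectors)
    # Precompute the full pairwise distance matrix (fine for small n).
    d = [[chi2_distance(vectors[i], vectors[j]) for j in range(n)]
         for i in range(n)]

    def cost(medoids):
        # Total distance of every point to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    medoids = random.Random(seed).sample(range(n), k)
    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                c = cost(candidate)
                if c < best:          # accept any improving swap
                    medoids, best, improved = candidate, c, True
        if not improved:
            break
    labels = [min(range(k), key=lambda g: d[i][medoids[g]]) for i in range(n)]
    return medoids, labels
```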



Figure 3: Partitioning around medoids of the constituencies in terms of their composition of network percolation clusters using {5, 10, 30, 60} clusters.


We can now extract a density of voting behaviour for each constituency, as given by the percolation, by averaging the vote densities over each element of the set $A$. That is, for the $g$-th element of $A$,

$$v'_g = \frac{1}{|A_g|} \sum_{C_i \in A_g} \hat{v}_i,$$

where $\hat{v}_i = v_i / \sum_{j=1}^{l} v_{i,j}$ is the density vector of the voting behaviour of the $i$-th constituency belonging to the set $A_g$.
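The averaging step can be sketched as follows, assuming (our choice of input format) that vote counts are lists in a fixed party order and that cluster labels come from the PAM step. Every constituency in $A_g$ is weighted equally, as in the formula for $v'_g$, regardless of how many votes it cast.

```python
def cluster_vote_densities(votes, labels):
    """Averaged vote-density vector v'_g for every cluster g.

    votes:  per-constituency vote counts v_i, one entry per party.
    labels: cluster index g of each constituency (e.g. from PAM).
    Each v_i is turned into shares that sum to 1, then the shares are
    averaged with equal weight over the constituencies of each cluster.
    """
    sums, counts = {}, {}
    for v, g in zip(votes, labels):
        total = sum(v)
        acc = sums.setdefault(g, [0.0] * len(v))
        for k, x in enumerate(v):
            acc[k] += x / total          # density component v_{i,k} / w_i
        counts[g] = counts.get(g, 0) + 1
    return {g: [x / counts[g] for x in acc] for g, acc in sums.items()}
```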

2.4 Performance of the clustering 

In order to quantify the quality of the partitioning of the set of constituencies based on the percolation, we study the error that it generates in comparison with several other types of partitioning. In particular, we take into account several socio-economic variables from the 2011 census [13] that could be relevant to voting behaviour, such as educational level (qualifications), occupational class (which also serves as a proxy for income), the age structure of the population and country of birth (which can account for areas with a high immigration rate).

The data for each variable are normalised by the total number of people that the variable accounts for. Qualifications is a vector with 5 categories: no qualifications, and levels 1, 2, 3 and 4. Occupational data is a vector that distinguishes between Managers, Professional, Associate Professional, Admin, Skilled Trades, Other Service, Sales, Process and Elementary. Age structure is a vector whose components are the number of people in each 5-year age group, and country of birth distinguishes between the categories UK, Ireland, Other EU, Other EU Accession and Rest of the World. We then use the same partitioning mechanism to cluster the constituencies according to each of these variables.
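Normalising a variable divides every category count by the total for that variable in the constituency. A minimal sketch with a hypothetical dictionary layout (constituency id to a list of category counts); a constituency with no recorded people is mapped to zeros rather than raising a division error.

```python
def normalise_counts(table):
    """Convert census category counts into proportions per constituency.

    table: dict mapping a constituency id to the list of counts for one
    variable's categories (e.g. the five qualification levels).
    """
    out = {}
    for cid, row in table.items():
        total = sum(row)             # people covered by this variable
        out[cid] = [c / total for c in row] if total else [0.0] * len(row)
    return out
```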

As we can observe in Figure 4, the best clustering corresponds to the percolation, with the second best being the occupational class data, and the qualifications variable performing similarly. This shows how relevant the area to which one belongs, its level of connectivity to other regions and its depth in the percolation tree (cities have a larger depth than rural areas) are in characterising voting behaviour. This can be explained relatively easily by considering the role that the exchange of ideas between peers plays in cultural patterns, and the fact that areas that are highly connected (or even belong to the same region) will have a wider range of migrant flows, thus influencing each other's way of thinking. The full extent of this analysis, and how the results can be improved by simultaneously using the socio-economic and the geographical data, will be treated in future work.

In order to produce the plot, we calculate the error as the sum of the distances between the averaged voting vector of $A_g$ ($v'_g$), scaled by each constituency's total vote, and the voting behaviour $v_i$ of each constituency $C_i$ included in the set $A_g$. That is:

$$\text{error} = \sum_{g} \sum_{C_i \in A_g} \left| w_i \cdot v'_g - v_i \right|,$$

where $w_i = \sum_{k=1}^{l} v_{i,k}$ is the total number of votes for constituency $C_i$.
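The error can be computed directly from the vote counts, the cluster labels and the averaged vectors $v'_g$. This sketch assumes $|\cdot|$ is the L1 norm (the sum of absolute differences over parties), which the text does not state explicitly; the input layout and the function name are hypothetical.

```python
def clustering_error(votes, labels, densities):
    """Total error: sum over constituencies of |w_i * v'_g - v_i|.

    votes:     per-constituency vote counts v_i.
    labels:    cluster index g of each constituency.
    densities: dict mapping g to the averaged vote-share vector v'_g.
    The absolute value is taken as an L1 norm over parties (an assumption).
    """
    error = 0.0
    for v, g in zip(votes, labels):
        w = sum(v)                                   # w_i: total votes in C_i
        expected = [w * s for s in densities[g]]     # cluster shares scaled to C_i
        error += sum(abs(e - x) for e, x in zip(expected, v))
    return error
```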

We can also extract the winner for each constituency using these averaged vectors, to get an approximate idea of how well they separate the space in terms of voting behaviour, as shown in Figure 5 for all the studied socio-economic datasets and for the percolation methodology.
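Extracting the winner amounts to taking, for each constituency, the party with the largest component of its cluster's averaged vector $v'_g$. In this sketch the names are hypothetical, the party list follows the order of the vote vectors, and ties go to the party listed first.

```python
def predicted_winners(labels, densities, parties):
    """Winning party per constituency from its cluster's averaged vector v'_g.

    parties lists party names in the same order as the vote vectors;
    on a tie, max() keeps the first party in that order.
    """
    winners = []
    for g in labels:
        shares = densities[g]
        best = max(range(len(shares)), key=lambda k: shares[k])
        winners.append(parties[best])
    return winners
```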
[Figure 4 plot: percentages of total error between averaged values of a cluster and real values of electoral data per constituency; legend (type of data): Electoral data, Random, Percolation, Qualifications, Occupations, Age structure, Country of birth.]

Figure 4: Percentage of total error for different numbers of clusters, comparing the socio-economic data with the approach presented in this paper.

2.5 Sub-trees 

We have produced the tree of the percolation using the full set of percolation thresholds from 5000m to 50m, but we could also perform this procedure by calculating the clusters only from 5000m to 4950m and generating the tree for those thresholds, then gradually lowering the bottom threshold to generate sub-trees of increasing detail. We can then study their clusterings and their averaged voting behaviour, and measure how much error each accounts for.

Furthermore, we can produce the plot presented in Figure 6 for all the sub-trees, showing the errors for the accumulated sub-trees. In this figure we can observe two main thresholds where there are large decreases in the error, producing local minima, at exactly 1400m and in the 900-800m range, and, on a smaller scale, a third in the 400-300m range. These three thresholds correspond respectively to three scales, the national scale, the regional scale and the city scale, that were represented in Figure 1.
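The local minima read off Figure 6 can also be located programmatically. The helper below is a hypothetical illustration, not part of the original analysis: given the accumulated-sub-tree error at each threshold, it returns the thresholds where the error is strictly lower than at both neighbouring thresholds. A flat minimum spread over a range such as 900-800m would need a tolerance or smoothing step, which is left out here.

```python
def local_minima(thresholds, errors):
    """Thresholds at which the error curve has a strict local minimum.

    thresholds and errors are parallel lists ordered by threshold;
    the two end points are never reported.
    """
    return [thresholds[i] for i in range(1, len(errors) - 1)
            if errors[i] < errors[i - 1] and errors[i] < errors[i + 1]]
```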

3 Predicting voting behaviours 

This entire approach is based on assuming that the percolation clusters identify a geographical pattern from which voter behaviour can emerge as a consequence of nationalistic and regional attitudes that reflect how Britain is fracturing into its long-standing historical subdivisions. We should alert ourselves to the possibility that geographical factors are more of a determinant of the current volatility in voter attitudes than at any time in the last 100 years. To this end, we examine how we might embed these geographical considerations into a simple model that is able to predict votes by combining the 2010 election results with our results from the percolation.

[Figure 5 maps, from left to right: Electoral winner 2010, Percolation, Qualifications, Occupations, Age structure, Country of birth. Legend (winner party): Conservative, Labour, Liberal Democrat, Scottish National Party, Plaid Cymru, Green party, UKIP, Speaker.]

Figure 5: From left to right, actual winners of the 2010 elections; winners extracted from the clustering based on the percolation; winners from the clustering of the qualifications variable; winners from the clustering of the occupational data; winners from the age structure clustering and from the country of birth clustering.

In order to predict voting behaviour we use the uniform national swing method [14], computed separately for Scotland and for England with Wales, which takes the following form:

1. Using the vote vectors of the constituencies ($v_i$), we calculate the average vote for each party in the two areas (Scotland, and England with Wales).

2. Taking into account the poll-based percentages for the final result published by The New Statesman (http://may2015.com), we produce a swing vector for Scotland and one for England with Wales; $s_i$ denotes the swing vector of the area that contains constituency $C_i$.

3. Using both vectors we generate a new vector of predicted votes for each constituency $C_i$ as $v^p_i = v_i + s_i$.
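The three steps can be sketched in Python as follows. The sketch makes assumptions that the text leaves open: it works with vote shares rather than raw counts, takes the area average as the unweighted mean of constituency shares, and defines each area's swing as the poll share minus that average. Area labels and poll vectors are hypothetical inputs in the same party order as $v_i$.

```python
def uniform_swing(shares, areas, polls):
    """Uniform swing applied separately within each area.

    shares: per-constituency 2010 vote-share vectors (same party order).
    areas:  area label of each constituency, e.g. "Scotland" or "EnglandWales".
    polls:  dict mapping area label to the poll vote-share vector.
    Returns the predicted share vectors v_i + s_i.
    """
    n_parties = len(shares[0])
    totals, counts = {}, {}
    # Step 1: unweighted mean share of each party within each area.
    for s, a in zip(shares, areas):
        acc = totals.setdefault(a, [0.0] * n_parties)
        for k in range(n_parties):
            acc[k] += s[k]
        counts[a] = counts.get(a, 0) + 1
    # Step 2: swing per area = poll share minus 2010 mean share.
    swing = {a: [polls[a][k] - totals[a][k] / counts[a] for k in range(n_parties)]
             for a in totals}
    # Step 3: add the swing of the constituency's area to its shares.
    return [[s[k] + swing[a][k] for k in range(n_parties)]
            for s, a in zip(shares, areas)]
```

Predicted shares can become negative for a party whose national decline exceeds its local share; this does not change which party has the largest share in a constituency.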

In order to have control values for our methodology we use the predictions presented in http://www.electionforecast.co.uk/, which are shown in column A_c of Table 1, and to check that our simple methodology is capable of generating valuable results, we produce a prediction based on the actual votes ($v_i$) from the 2010 elections, which is shown in column B_p. As we can see in the table, the results are quite similar.

We then apply this method to generate a prediction based on the percolation. Instead of using the actual votes we use the averaged votes from the clusters generated with the percolation ($v_i = w_i \cdot v'_g$ for $C_i \in A_g$) and recalculate the swing votes to form column D_p. Finally, we do the same for the clusters generated from the occupational data to generate column E_p, and we then calculate the average between the votes obtained with the percolation and the votes obtained with the occupational data, substitute it for $v_i$, recalculate the swing votes and produce the output shown in column C_p.
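Column C_p combines the two sets of cluster-averaged votes before the swing is applied, and seat totals like those in Table 1 follow from awarding each constituency to its largest party under first past the post. The two helpers below are a hypothetical sketch of those steps. If the swing is a uniform additive shift computed from unweighted area means, averaging the inputs before the swing gives the same prediction as averaging two swung predictions, because that computation is linear.

```python
from collections import Counter

def average_votes(votes_a, votes_b):
    """Element-wise mean of two lists of per-constituency vote vectors."""
    return [[(x + y) / 2 for x, y in zip(a, b)] for a, b in zip(votes_a, votes_b)]

def seat_counts(predicted, parties):
    """First past the post: each constituency's seat goes to its largest party."""
    return Counter(parties[max(range(len(v)), key=lambda k: v[k])] for v in predicted)
```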

As we can observe, the result obtained with the percolation clusters overestimates the impact of the Labour Party, while the occupational data produce the inverse effect. Using the average of both voting behaviours produces a map of constituencies that resembles to a large extent the prediction based on the actual votes, showing that there is an underlying relationship between the voting patterns of people and their location in the network, closely related to the occupational data.

[Figure 6 plot: errors of accumulated subtrees; x-axis: distance threshold.]

Figure 6: Errors of the accumulated subtrees for 60 clusters.


4 So What Will Be the Outcome?

In essence, our model picks up neither the extremes of voting that the current polls are showing nor the neck-and-neck race between the traditional parties. If the result of the election is as the recent YouGov polls suggest (see http://www.electionforecast.co.uk/), the Conservatives will gain 285 seats, Labour around 270, the SNP 50 and the Liberal Democrats 25. This indeed would be a strange result by historical standards. It probably represents a hung parliament, with no party able to win an outright majority and in fact no parties able to form a stable coalition. The closest that our model comes to forecasting this is with the Conservatives on 301, Labour 271 and the SNP 56, while the Liberal Democrats are erased from the map. But our most extreme prediction, which still takes account of the geographical effects, produces a much larger concentration of seats for the Labour Party. This raises the question of how capable the first-past-the-post system is of representing the number of votes proportionally, given that different partitionings of the constituencies produce very different results. The same variation appears at the opposite extreme in the partition generated from the occupational data, which agglomerates the constituencies in such a way that the Conservatives get a clear win. Strange times indeed.

Parties              A_c   B_p   C_p   D_p   E_p
Conservatives        286   283   301   260   323
Labour               267   276   272   311   252
SNP                   48    49    55    56    54
Liberal Democrats     24    16     0     1     0
Plaid Cymru            4     3     1     1     0
Greens                 1     1     0     0     0
UKIP                   1     0     0     0     0
Speaker                1     1     0     0     0

Table 1: Voting predictions by number of seats. A_c: prediction from http://www.electionforecast.co.uk/. B_p: prediction based on the real voting vectors. C_p: prediction based on the percolation and the occupational class data voting behaviour. D_p: prediction based on the percolation voting behaviour. E_p: prediction based on the occupational data.

Figure 7: From left to right: (B_p) prediction using the actual votes and the polls; (C_p) prediction using the percolation and the occupational data; (D_p) prediction using solely the percolation; and (E_p) prediction using the occupational data.

To an extent, what we have developed here is a work in progress. We will only be able to refine our model once the votes are known on May 7th 2015, when we will be able to undertake a much more considered analysis of geographical factors, but we remain convinced that geographical isolation, separation and connectivity are key factors in determining not only how people vote but even how they think, and it is this that would appear to be dictating the high volatility of current and more considered predictions. In fact, in such a situation there could well be a final bounce or shift, a transition to traditional or even more extreme voting, or some combination of both, when the voters go to the polls and the votes are finally counted.

Exciting times. 